Introduction/Business problem

Busan, formerly romanized as Pusan and now officially Busan Metropolitan City, is South Korea's second-most populous city after Seoul, with a population of over 3.5 million inhabitants.[3] It is the economic, cultural and educational center of southeastern South Korea, with its port—Korea's busiest and the fifth-busiest in the world [a]—only about 190 kilometers (120 mi) from the Japanese islands of Kyushu and Honshu. The surrounding "Southeast Economic Zone" (including Ulsan and South Gyeongsang) is South Korea's largest industrial area.

However, Busan is still not well known to foreign visitors. I would therefore like to provide useful material for tourists who want to discover new cities in Korea, as well as for business travelers.

Description of the data

I used the following data for this project.

Administrative divisions

Data source: https://en.wikipedia.org/wiki/Busan
Description: Administrative divisions table listing the 16 subdivisions with the area and population of each district

Map data of Busan:

Data source: http://www.gisdeveloper.co.kr/?p=2332
Description: Shapefile of South Korea, used to extract the GeoJSON of Busan

Venues in each subdivision of Busan:

Data source: Foursquare APIs
Description: All the venues in each subdivision

Methodology

Scrape data from Wikipedia

First, I scraped the administrative divisions table from Wikipedia (https://en.wikipedia.org/wiki/Busan) and transformed it into a dataframe containing the name, area (in km²) and population of the 16 subdivisions.

In [79]:
busan_data
Out[79]:
   | Subdivision | Area | Population
 0 | Buk-gu | 39.36 | 303955
 1 | Busanjin-gu | 29.70 | 372922
 2 | Dong-gu | 9.73 | 90668
 3 | Dongnae-gu | 16.63 | 271350
 4 | Gangseo-gu | 181.50 | 123636
 5 | Geumjeong-gu | 65.27 | 249054
 6 | Haeundae-gu | 51.47 | 417174
 7 | Jung-gu | 2.83 | 45821
 8 | Nam-gu | 26.81 | 278681
 9 | Saha-gu | 41.75 | 337423
10 | Sasang-gu | 36.09 | 233443
11 | Seo-gu | 13.93 | 111906
12 | Suyeong-gu | 10.21 | 181526
13 | Yeongdo-gu | 14.15 | 124918
14 | Yeonje-gu | 12.08 | 207396
15 | Gijang-gun | 218.32 | 164546
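
The scraping step could be sketched roughly as follows. This is a minimal sketch rather than the notebook's actual code: it assumes that `pandas.read_html` can locate the table by matching the text "Subdivision" and that the columns are named as in the output above. The helper names `fetch_district_table` and `clean_district_table` are hypothetical.

```python
import pandas as pd

WIKI_URL = "https://en.wikipedia.org/wiki/Busan"


def fetch_district_table(url: str = WIKI_URL) -> pd.DataFrame:
    """Download the page and return the first table mentioning 'Subdivision'.

    Assumption: the administrative divisions table is the first match.
    Needs network access and an HTML parser such as lxml.
    """
    tables = pd.read_html(url, match="Subdivision")
    return tables[0]


def clean_district_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep the three columns used in this report and make them numeric."""
    df = raw[["Subdivision", "Area", "Population"]].copy()
    # Scraped numbers often arrive as text such as "303,955".
    for col in ("Area", "Population"):
        df[col] = pd.to_numeric(
            df[col].astype(str).str.replace(",", "", regex=False),
            errors="coerce",
        )
    df["Population"] = df["Population"].astype("Int64")
    return df.reset_index(drop=True)


# Offline demonstration with two rows copied from the table above.
raw = pd.DataFrame(
    {
        "Subdivision": ["Buk-gu", "Jung-gu"],
        "Korean": ["북구", "중구"],
        "Area": ["39.36", "2.83"],
        "Population": ["303,955", "45,821"],
    }
)
busan_sample = clean_district_table(raw)
print(busan_sample)
```

Keeping the download and the cleaning in separate functions means the cleaning logic can be checked without network access.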

Shapefile (SHP) data of Busan

The next step was to get the shapefile (SHP) of South Korea and extract the GeoJSON of Busan using QGIS 3 (https://www.qgis.org/).

Left: the whole of South Korea
Right: after filtering down to Busan

GeoJSON to GeoDataFrame

Then I used geopandas to load the GeoJSON into a GeoDataFrame and merged it with the dataframe scraped from Wikipedia.
The resulting GeoDataFrame contains the geometry of each subdivision alongside its area and population.

In [81]:
busan_data
Out[81]:
   | Subdivision | Korean | geometry | Area | Population
 0 | Jung-gu | 중구 | MULTIPOLYGON (((129.03231 35.11643, 129.03235 ... | 2.83 | 45821
 1 | Seo-gu | 서구 | MULTIPOLYGON (((129.01542 35.04808, 129.01515 ... | 13.93 | 111906
 2 | Dong-gu | 동구 | MULTIPOLYGON (((129.04264 35.14589, 129.04327 ... | 9.73 | 90668
 3 | Yeongdo-gu | 영도구 | MULTIPOLYGON (((129.09320 35.03771, 129.09324 ... | 14.15 | 124918
 4 | Busanjin-gu | 부산진구 | MULTIPOLYGON (((129.04001 35.19981, 129.04033 ... | 29.70 | 372922
 5 | Dongnae-gu | 동래구 | MULTIPOLYGON (((129.07905 35.22509, 129.07910 ... | 16.63 | 271350
 6 | Nam-gu | 남구 | MULTIPOLYGON (((129.12702 35.09096, 129.12697 ... | 26.81 | 278681
 7 | Buk-gu | 북구 | MULTIPOLYGON (((128.98774 35.20145, 128.98774 ... | 39.36 | 303955
 8 | Haeundae-gu | 해운대구 | MULTIPOLYGON (((129.13898 35.15860, 129.13963 ... | 51.47 | 417174
 9 | Saha-gu | 사하구 | MULTIPOLYGON (((128.95633 34.88970, 128.95609 ... | 41.75 | 337423
10 | Geumjeong-gu | 금정구 | MULTIPOLYGON (((129.10621 35.30646, 129.10640 ... | 65.27 | 249054
11 | Gangseo-gu | 강서구 | MULTIPOLYGON (((128.77648 35.01114, 128.77625 ... | 181.50 | 123636
12 | Yeonje-gu | 연제구 | MULTIPOLYGON (((129.07817 35.19945, 129.07827 ... | 12.08 | 207396
13 | Suyeong-gu | 수영구 | MULTIPOLYGON (((129.11682 35.18344, 129.11689 ... | 10.21 | 181526
14 | Sasang-gu | 사상구 | MULTIPOLYGON (((128.99094 35.19385, 128.99121 ... | 36.09 | 233443
15 | Gijang-gun | 기장군 | MULTIPOLYGON (((129.22953 35.21716, 129.22938 ... | 218.32 | 164546
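
One way the load-and-merge step might look is sketched below. It is an illustration under assumptions, not the notebook's own code: the helper names `load_busan_geodata` and `attach_attributes` are hypothetical, and the GeoJSON file is assumed to carry `Subdivision` and `Korean` properties. `geopandas.read_file` and the `merge` method are real; the demonstration uses a plain DataFrame in place of the GeoDataFrame so that it runs without geopandas installed.

```python
import pandas as pd


def load_busan_geodata(geojson_path: str):
    """Read the GeoJSON exported from QGIS into a GeoDataFrame.

    The import is local so the rest of this sketch runs without geopandas.
    """
    import geopandas as gpd

    return gpd.read_file(geojson_path)


def attach_attributes(geo, table: pd.DataFrame):
    """Left-join area and population onto the geometry rows by district name."""
    merged = geo.merge(table, on="Subdivision", how="left", validate="one_to_one")
    # Any district without a match would show up here as NaN.
    missing = merged["Population"].isna().sum()
    if missing:
        raise ValueError(f"{missing} subdivisions have no matching attribute row")
    return merged


# Offline demonstration: a plain DataFrame stands in for the GeoDataFrame.
geo = pd.DataFrame(
    {"Subdivision": ["Jung-gu", "Seo-gu"], "Korean": ["중구", "서구"],
     "geometry": ["<polygon>", "<polygon>"]}
)
table = pd.DataFrame(
    {"Subdivision": ["Seo-gu", "Jung-gu"], "Area": [13.93, 2.83],
     "Population": [111906, 45821]}
)
merged = attach_attributes(geo, table)
print(merged)
```

Calling `merge` on the geometry side keeps the row order of the GeoJSON, which matches the order seen in the merged output above.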

Map rendering

Using the folium package and the geodataframe, I then created a map of the 16 subdivisions with population.

In [10]:
busan_map = build_map(busan_data, 'default')
tooltip = build_tooltip(busan_map, busan_data, ['Subdivision', 'Population'], ['Subdivision: ','Population: '])

busan_map
C:\Users\JunyoungHwang\anaconda3\lib\site-packages\folium\folium.py:415: FutureWarning: The choropleth  method has been deprecated. Instead use the new Choropleth class, which has the same arguments. See the example notebook 'GeoJSON_and_choropleth' for how to do this.
  FutureWarning
Out[10]:
[Interactive folium map: the 16 subdivisions of Busan with their population]
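
The FutureWarning above recommends the `Choropleth` class over the deprecated `choropleth` method. A minimal sketch of that change follows. It is not the author's `build_map` helper, whose code is not shown: the function names, the color scheme, the approximate map center and the assumption that each GeoJSON feature has a `Subdivision` property are all illustrative.

```python
def choropleth_kwargs(geojson, data, value_column="Population"):
    """Collect the arguments for folium.Choropleth in one place."""
    return {
        "geo_data": geojson,
        "data": data,
        "columns": ["Subdivision", value_column],
        # Assumption: each GeoJSON feature carries a 'Subdivision' property.
        "key_on": "feature.properties.Subdivision",
        "fill_color": "YlOrRd",
        "fill_opacity": 0.7,
        "line_opacity": 0.3,
        "legend_name": value_column,
    }


def build_population_map(geojson, data, center=(35.18, 129.08), zoom=10):
    """Draw the districts shaded by population with the Choropleth class."""
    import folium  # local import: only needed when the map is rendered

    # The default center is an approximate latitude/longitude for Busan.
    m = folium.Map(location=list(center), zoom_start=zoom)
    folium.Choropleth(**choropleth_kwargs(geojson, data)).add_to(m)
    return m


# Offline demonstration: inspect the arguments without rendering a map.
kwargs = choropleth_kwargs({"type": "FeatureCollection", "features": []}, None)
print(kwargs["key_on"])
```

In a notebook, `build_population_map(geojson, busan_data)` would return the map object for display, given a GeoJSON object or file path and the merged dataframe.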

Using Foursquare Location Data

Finally, I passed each subdivision name to the Foursquare API to get the venues in Busan; 463 venues were returned.
The query string has the following form:
e.g. near="수영구, 부산광역시, 대한민국" (Suyeong-gu, Busan Metropolitan City, Republic of Korea)

In [85]:
busan_venues = getNearbyVenues(subdivision=busan_data['Subdivision'], 
                               korean=busan_data['Korean'])
print(busan_venues.shape)
busan_venues.head()
(463, 5)
Out[85]:
   | Subdivision | Venue | Venue Latitude | Venue Longitude | Venue Category
 0 | Jung-gu | 이재모피자 | 35.102056 | 129.030717 | Pizza Place
 1 | Jung-gu | 화국반점 | 35.102683 | 129.034004 | Chinese Restaurant
 2 | Jung-gu | Golmok Gejang (골목게장) | 35.108443 | 129.038069 | Korean Restaurant
 3 | Jung-gu | Starbucks (스타벅스) | 35.105049 | 129.036298 | Coffee Shop
 4 | Jung-gu | Busan Tower (부산타워) | 35.100929 | 129.032514 | Scenic Lookout
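
The body of `getNearbyVenues` is not shown, so the following is only a sketch of how such a helper might query the API and flatten the result. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its `response.groups[].items[].venue` layout; the function names, the version date and the credentials are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the legacy v2 explore endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(near, client_id, client_secret, version="20200101", limit=100):
    """Compose the request URL; urlencode percent-encodes the Korean text."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "near": near,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


def parse_venues(subdivision, payload):
    """Flatten one explore response into rows shaped like the table above."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            rows.append(
                (
                    subdivision,
                    venue["name"],
                    venue["location"]["lat"],
                    venue["location"]["lng"],
                    categories[0].get("name"),
                )
            )
    return rows


def fetch_venues(subdivision, near, client_id, client_secret):
    """Call the API for one district (needs network access and credentials)."""
    with urlopen(build_explore_url(near, client_id, client_secret)) as resp:
        return parse_venues(subdivision, json.load(resp))


# Offline demonstration with a hand-written payload in the assumed shape.
sample = {"response": {"groups": [{"items": [{"venue": {
    "name": "Busan Tower (부산타워)",
    "location": {"lat": 35.100929, "lng": 129.032514},
    "categories": [{"name": "Scenic Lookout"}],
}}]}]}}
rows = parse_venues("Jung-gu", sample)
print(rows)
```

Collecting the rows for all 16 subdivisions and passing them to `pandas.DataFrame` with the five column names would give a table of the same shape as `busan_venues`.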

The most common venue categories in Busan

I plotted a bar chart of the 10 most common venue categories across the whole city. Coffee Shop, Korean Restaurant and Fast Food Restaurant are the top 3 among the 84 unique venue categories in Busan.

In [86]:
venues = busan_venues.groupby('Venue Category').count().sort_values(by='Venue', ascending=False).head(10)
venues.plot.bar(y="Venue", use_index=True, rot=70, title="Top 10 highest number of venues in Busan", figsize=(15,5));
In [87]:
print('There are {} unique categories.'.format(len(busan_venues['Venue Category'].unique())))
There are 84 unique categories.

To find clusters of city districts, I created a dataframe with a pandas one-hot encoding of the venue categories.

In [90]:
print(busan_onehot.shape)
busan_onehot.head()
(463, 85)
Out[90]:
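   | Subdivision | Airport | Airport Lounge | Art Gallery | Asian Restaurant | BBQ Joint | Bakery | Bar | Baseball Field | Beach | ... | Spa | Steakhouse | Supermarket | Sushi Restaurant | Theme Park | Toll Plaza | Trail | Train Station | Turkish Restaurant | Used Bookstore
 0 | Jung-gu | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
 1 | Jung-gu | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
 2 | Jung-gu | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
 3 | Jung-gu | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
 4 | Jung-gu | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0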

5 rows × 85 columns
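
The encoding step can be illustrated on a toy table. The sketch below assumes `pandas.get_dummies` was used for the one-hot encoding and that the per-district frame used later for clustering (`busan_grouped`) is the mean of the indicators per subdivision. Neither cell is shown in the notebook, so both are assumptions, and the variable names here are illustrative.

```python
import pandas as pd

venues = pd.DataFrame(
    {
        "Subdivision": ["Jung-gu", "Jung-gu", "Seo-gu", "Seo-gu"],
        "Venue Category": ["Coffee Shop", "Pizza Place", "Coffee Shop", "Coffee Shop"],
    }
)

# One indicator column per venue category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
# Put the district name back as the first column.
onehot.insert(0, "Subdivision", venues["Subdivision"])

# Averaging the indicators gives each category's share of a district's venues.
grouped = onehot.groupby("Subdivision").mean().reset_index()
print(grouped)
```

With the real data this produces one row per venue and 85 columns (84 categories plus `Subdivision`), consistent with the shape (463, 85) reported above.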

I used this information to create a dataframe showing the ten most common venue categories for each city district.

In [95]:
subdivision_venues_sorted
Out[95]:
   | Subdivision | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue
 0 | Buk-gu | Bakery | Coffee Shop | Fast Food Restaurant | Korean Restaurant | Ice Cream Shop | Park | Multiplex | Market | Dumpling Restaurant | Hotel
 1 | Busanjin-gu | Coffee Shop | Korean Restaurant | Italian Restaurant | Noodle House | BBQ Joint | Hotel | Park | Concert Hall | Multiplex | Department Store
 2 | Dong-gu | Korean Restaurant | Gukbap Restaurant | Hotel | Café | Coffee Shop | Chinese Restaurant | Scenic Lookout | Park | Trail | Bakery
 3 | Dongnae-gu | Coffee Shop | Korean Restaurant | Supermarket | Hotel | Market | Department Store | Fried Chicken Joint | Big Box Store | Noodle House | Dumpling Restaurant
 4 | Gangseo-gu | Supermarket | Fast Food Restaurant | Korean Restaurant | Coffee Shop | Airport Lounge | Ice Cream Shop | Airport | Racecourse | Hotel | Duty-free Shop
 5 | Geumjeong-gu | Coffee Shop | Café | Korean Restaurant | Bus Station | Supermarket | Fast Food Restaurant | Turkish Restaurant | Department Store | Bus Line | Moroccan Restaurant
 6 | Gijang-gun | Coffee Shop | Seafood Restaurant | Golf Course | Korean Restaurant | Park | Beach | Cemetery | Market | Monument / Landmark | Outlet Mall
 7 | Haeundae-gu | BBQ Joint | Korean Restaurant | Hotel | Coffee Shop | Bar | Seafood Restaurant | Hostel | Buffet | Dumpling Restaurant | Japanese Restaurant
 8 | Jung-gu | Coffee Shop | Korean Restaurant | Hotel | Market | Park | Used Bookstore | Public Art | Japanese Restaurant | Department Store | Multiplex
 9 | Nam-gu | Ice Cream Shop | Fast Food Restaurant | Gukbap Restaurant | Korean Restaurant | Bakery | Coffee Shop | Seafood Restaurant | Concert Hall | Japanese Restaurant | Ramen Restaurant
10 | Saha-gu | Coffee Shop | Bakery | Fast Food Restaurant | Ice Cream Shop | Food Court | Beach | Café | Outlet Store | Metro Station | Fried Chicken Joint
11 | Sasang-gu | Supermarket | Korean Restaurant | Coffee Shop | Fast Food Restaurant | Airport Lounge | Airport | Park | Hotel | Ice Cream Shop | Duty-free Shop
12 | Seo-gu | Coffee Shop | Korean Restaurant | BBQ Joint | Market | Used Bookstore | Noodle House | Chinese Restaurant | Fried Chicken Joint | Pizza Place | Plaza
13 | Suyeong-gu | Coffee Shop | Seafood Restaurant | Korean Restaurant | BBQ Joint | Bar | Brewery | Park | Multiplex | Performing Arts Venue | Japanese Restaurant
14 | Yeongdo-gu | Korean Restaurant | Beach | Scenic Lookout | Trail | Café | Chinese Restaurant | Coffee Shop | Pier | Deli / Bodega | Port
15 | Yeonje-gu | Coffee Shop | Fast Food Restaurant | Supermarket | Intersection | Bakery | Baseball Field | Metro Station | Donut Shop | Outdoor Sculpture | Multiplex
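
A ranking table like the one above could be derived from the per-district category frequencies along these lines. The sketch is hypothetical: the notebook does not show how `subdivision_venues_sorted` was built, and the names `ordinal` and `most_common_venues` are introduced here for illustration.

```python
import pandas as pd


def ordinal(n: int) -> str:
    """Return 1 -> '1st', 2 -> '2nd', 3 -> '3rd', 4 -> '4th', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def most_common_venues(grouped: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Rank the category frequencies of each district, highest first."""
    freq = grouped.set_index("Subdivision")
    columns = [f"{ordinal(i)} Most Common Venue" for i in range(1, top_n + 1)]
    rows = [
        row.sort_values(ascending=False, kind="stable").index[:top_n].tolist()
        for _, row in freq.iterrows()
    ]
    return pd.DataFrame(rows, index=freq.index, columns=columns).reset_index()


# Toy frequencies for two districts and three categories.
grouped = pd.DataFrame(
    {
        "Subdivision": ["Jung-gu", "Yeongdo-gu"],
        "Beach": [0.0, 0.3],
        "Coffee Shop": [0.6, 0.2],
        "Korean Restaurant": [0.4, 0.5],
    }
)
top2 = most_common_venues(grouped, top_n=2)
print(top2)
```

Categories a district does not contain at all have frequency 0 and still appear at the end of the ranking, so the later ranks of districts with few venues carry little information.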

To help a traveler decide which district to visit, I grouped similar districts with K-means clustering (KMeans).
The data is standardized with StandardScaler, and the silhouette method is used to find the optimal K, that is, the K with the highest mean silhouette score.
The optimal K is 4.

In [98]:
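import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Drop the label column; only the per-category frequencies are clustered.
busan_grouped_clustering = busan_grouped.drop(columns='Subdivision')
Clus_dataSet = StandardScaler().fit_transform(busan_grouped_clustering)

# Mean silhouette score for each candidate K (higher is better).
silhouette_scores = []

for k in range(2, 6):
    # fit_predict fits the model once and returns the cluster labels.
    predict = KMeans(n_clusters=k, n_init=10).fit_predict(Clus_dataSet)
    silhouette_scores.append(silhouette_score(Clus_dataSet, predict, metric='euclidean'))

plt.plot(range(2, 6), silhouette_scores)
plt.title('Silhouette Method')
plt.xlabel('Number of clusters')
plt.ylabel('Mean silhouette score')
plt.show()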
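
Reading the best K off the plot can also be done programmatically. The sketch below is an illustration on synthetic data, not the notebook's data: the function name `best_k_by_silhouette` is hypothetical, and a fixed `random_state` is added (an assumption, the original run was not seeded) so that the result is reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Return (best_k, scores) where scores maps each K to its mean silhouette."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels, metric="euclidean")
    # The K with the highest mean silhouette score is taken as the best one.
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in: three tight, well-separated groups of 2-D points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

best_k, scores = best_k_by_silhouette(X, range(2, 6))
print(best_k, scores)
```

With only 16 districts the silhouette scores can vary noticeably between unseeded runs, which is one reason to fix the seed when comparing values of K.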

Result and Conclusion

Here is the result of K-means clustering with K = 4.
The table shows the city districts and their most common venues, each now assigned one of 4 cluster labels.
Coffee Shop and Korean Restaurant turn out to be common all over Busan.

In [100]:
busan_merged.head()
Out[100]:
   | Subdivision | Korean | geometry | Area | Population | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue
 0 | Jung-gu | 중구 | MULTIPOLYGON (((129.03231 35.11643, 129.03235 ... | 2.83 | 45821 | 0 | Coffee Shop | Korean Restaurant | Hotel | Market | Park | Used Bookstore | Public Art | Japanese Restaurant | Department Store | Multiplex
 1 | Seo-gu | 서구 | MULTIPOLYGON (((129.01542 35.04808, 129.01515 ... | 13.93 | 111906 | 0 | Coffee Shop | Korean Restaurant | BBQ Joint | Market | Used Bookstore | Noodle House | Chinese Restaurant | Fried Chicken Joint | Pizza Place | Plaza
 2 | Dong-gu | 동구 | MULTIPOLYGON (((129.04264 35.14589, 129.04327 ... | 9.73 | 90668 | 0 | Korean Restaurant | Gukbap Restaurant | Hotel | Café | Coffee Shop | Chinese Restaurant | Scenic Lookout | Park | Trail | Bakery
 3 | Yeongdo-gu | 영도구 | MULTIPOLYGON (((129.09320 35.03771, 129.09324 ... | 14.15 | 124918 | 0 | Korean Restaurant | Beach | Scenic Lookout | Trail | Café | Chinese Restaurant | Coffee Shop | Pier | Deli / Bodega | Port
 4 | Busanjin-gu | 부산진구 | MULTIPOLYGON (((129.04001 35.19981, 129.04033 ... | 29.70 | 372922 | 0 | Coffee Shop | Korean Restaurant | Italian Restaurant | Noodle House | BBQ Joint | Hotel | Park | Concert Hall | Multiplex | Department Store
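
The final clustering and labelling step might be sketched as follows. This is an assumption-laden illustration rather than the notebook's code: the helper name `add_cluster_labels` is hypothetical and the toy frame stands in for `busan_grouped`. Note that K-means label numbers are arbitrary and can change between unseeded runs, so a fixed `random_state` is used here to keep them stable.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def add_cluster_labels(grouped: pd.DataFrame, k: int, random_state: int = 0) -> pd.DataFrame:
    """Cluster the per-district frequencies and return district/label pairs."""
    features = StandardScaler().fit_transform(grouped.drop(columns="Subdivision"))
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(features)
    return pd.DataFrame(
        {"Subdivision": grouped["Subdivision"].to_numpy(), "Cluster Labels": model.labels_}
    )


# Toy stand-in: two beach-heavy districts and two cafe-heavy districts.
grouped = pd.DataFrame(
    {
        "Subdivision": ["A-gu", "B-gu", "C-gu", "D-gu"],
        "Beach": [0.9, 0.8, 0.0, 0.1],
        "Coffee Shop": [0.1, 0.2, 1.0, 0.9],
    }
)
labels = add_cluster_labels(grouped, k=2)
print(labels)
```

The resulting frame can then be merged onto the GeoDataFrame by `Subdivision`, in the same way as the area and population columns.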

Cluster 0 - Hotel and Market are the common venues in these districts (besides Coffee Shop and Korean Restaurant)

In [73]:
busan_merged.loc[busan_merged['Cluster Labels'] == 0, busan_merged.columns[[0] + list(range(5, busan_merged.shape[1]))]]
Out[73]:
   | Subdivision | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue
 0 | Jung-gu | 0 | Coffee Shop | Korean Restaurant | Hotel | Market | Park | Used Bookstore | Public Art | Japanese Restaurant | Department Store | Multiplex
 1 | Seo-gu | 0 | Coffee Shop | Korean Restaurant | BBQ Joint | Market | Used Bookstore | Noodle House | Chinese Restaurant | Fried Chicken Joint | Pizza Place | Plaza
 2 | Dong-gu | 0 | Korean Restaurant | Gukbap Restaurant | Hotel | Café | Coffee Shop | Chinese Restaurant | Scenic Lookout | Park | Trail | Bakery
 4 | Busanjin-gu | 0 | Coffee Shop | Korean Restaurant | Italian Restaurant | Noodle House | BBQ Joint | Hotel | Park | Concert Hall | Multiplex | Department Store
 5 | Dongnae-gu | 0 | Coffee Shop | Korean Restaurant | Supermarket | Hotel | Market | Department Store | Fried Chicken Joint | Big Box Store | Noodle House | Dumpling Restaurant
 8 | Haeundae-gu | 0 | BBQ Joint | Korean Restaurant | Hotel | Coffee Shop | Bar | Seafood Restaurant | Hostel | Buffet | Dumpling Restaurant | Japanese Restaurant

Cluster 1 - Supermarket and Fast Food Restaurant are the common venues in these districts (besides Coffee Shop and Korean Restaurant)

In [74]:
busan_merged.loc[busan_merged['Cluster Labels'] == 1, busan_merged.columns[[0] + list(range(5, busan_merged.shape[1]))]]
Out[74]:
   | Subdivision | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue
 6 | Nam-gu | 1 | Ice Cream Shop | Fast Food Restaurant | Gukbap Restaurant | Korean Restaurant | Bakery | Coffee Shop | Seafood Restaurant | Concert Hall | Japanese Restaurant | Ramen Restaurant
 7 | Buk-gu | 1 | Bakery | Coffee Shop | Fast Food Restaurant | Korean Restaurant | Ice Cream Shop | Park | Multiplex | Market | Dumpling Restaurant | Hotel
 9 | Saha-gu | 1 | Coffee Shop | Bakery | Fast Food Restaurant | Donut Shop | Ice Cream Shop | Food Court | Beach | Café | Outlet Store | Metro Station
10 | Geumjeong-gu | 1 | Coffee Shop | Café | Korean Restaurant | Bus Station | Supermarket | Fast Food Restaurant | Turkish Restaurant | Department Store | Bus Line | Moroccan Restaurant
11 | Gangseo-gu | 1 | Supermarket | Fast Food Restaurant | Korean Restaurant | Coffee Shop | Airport Lounge | Ice Cream Shop | Airport | Racecourse | Hotel | Duty-free Shop
12 | Yeonje-gu | 1 | Coffee Shop | Fast Food Restaurant | Supermarket | Intersection | Bakery | Baseball Field | Metro Station | Donut Shop | Outdoor Sculpture | Multiplex
13 | Suyeong-gu | 1 | Coffee Shop | Seafood Restaurant | Korean Restaurant | BBQ Joint | Bar | Brewery | Park | Multiplex | Performing Arts Venue | Japanese Restaurant
14 | Sasang-gu | 1 | Supermarket | Korean Restaurant | Coffee Shop | Fast Food Restaurant | Airport Lounge | Airport | Park | Hotel | Ice Cream Shop | Duty-free Shop

Cluster 2 - Beach and Scenic Lookout are the common venues in this district (besides Coffee Shop and Korean Restaurant)

In [75]:
busan_merged.loc[busan_merged['Cluster Labels'] == 2, busan_merged.columns[[0] + list(range(5, busan_merged.shape[1]))]]
Out[75]:
   | Subdivision | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue
 3 | Yeongdo-gu | 2 | Korean Restaurant | Beach | Scenic Lookout | Trail | Café | Chinese Restaurant | Coffee Shop | Pier | Deli / Bodega | Port

Cluster 3 - Seafood Restaurant and Golf Course are the common venues in this district (besides Coffee Shop and Korean Restaurant)

In [76]:
busan_merged.loc[busan_merged['Cluster Labels'] == 3, busan_merged.columns[[0] + list(range(5, busan_merged.shape[1]))]]
Out[76]:
   | Subdivision | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue
15 | Gijang-gun | 3 | Coffee Shop | Seafood Restaurant | Golf Course | Korean Restaurant | Park | Beach | Cemetery | Market | Monument / Landmark | Outlet Mall

This map shows the city districts in cluster-specific colors, with a marker for each venue.

In [102]:
busan_map
Out[102]:
[Interactive folium map: districts colored by cluster, with venue markers]
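
Cluster-specific coloring of the districts could be done with a folium style function, roughly as sketched here. The notebook's own map-building helpers are not shown, so this is only one possible approach: the palette, the function names and the assumption that each GeoJSON feature carries `Subdivision` and `Cluster Labels` properties are all illustrative.

```python
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]  # one color per cluster


def cluster_style(feature):
    """Style one GeoJSON feature according to its 'Cluster Labels' property."""
    label = int(feature["properties"]["Cluster Labels"])
    return {
        "fillColor": PALETTE[label % len(PALETTE)],
        "color": "black",
        "weight": 1,
        "fillOpacity": 0.6,
    }


def add_cluster_layer(m, merged_geojson):
    """Add the colored districts to an existing folium map."""
    import folium  # local import: only needed when the map is rendered

    folium.GeoJson(
        merged_geojson,
        style_function=cluster_style,
        tooltip=folium.GeoJsonTooltip(
            fields=["Subdivision", "Cluster Labels"],
            aliases=["Subdivision: ", "Cluster: "],
        ),
    ).add_to(m)
    return m


# Offline demonstration: style a single hand-written feature.
feature = {"type": "Feature", "properties": {"Subdivision": "Gijang-gun", "Cluster Labels": 3}}
style = cluster_style(feature)
print(style)
```

Venue markers could then be added on top of this layer, for example one `folium.CircleMarker` per row of the venues table.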